-
Notifications
You must be signed in to change notification settings - Fork 14
Fix MOV memoization safety and add repro tests #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'm hoping the silence so far is a tentative good sign :) (Other bounty PRs quickly had a comment showing they were invalid.) |
|
Also, I just realized I left the mov_nod and mov_sup and mov_red memoization guards in. I committed that change. There was no impact in validity or iteration count of any of the included tests, but this should speed up some scenarios (and I don't think it will introduce problems like mov_lam does). |
|
We're going to review that shortly, impressive work anyways! |
|
Hey @HaileyStorm, take a look at the following snippet: I ran this on your branch, and got: Note that the second result should be the same as all others, but it is: |
|
Hey @Lorenzobattistela . It turns out, it was silly of me to remove those memoization guards (see commit 9d07d3b). The MOV-NOD/SUP/RED need them because they can wrap lambdas. Whenever MOV memoizes a structure that contains lambdas, sharing that structure shares inner binder slots. Note MOV-RED wasn't tested in this example. Summary:
Evidence:
Tests:
|
|
I've felt confused about how dropping the "memoization" should work here, since in my picture the interaction calculus has a "use once" policy about terms, with its model sharing computation by doing destructive local reductions based on local context. I believe that things are mostly working in this PR because we're usually encountering a GOT->MOV->ALO, and MOV-LAM is firing immediately after ALO emits a LAM. By skipping overwriting the loc with the new LAM, the code leaves the ALO in place to create a second LAM later on re-entry. (And re-entry does occur when things are DUP'd.) So via re-allocating, these cases avoid a single LAM being used twice. But there is still potential for the reuse of variables that have been brought into scope (that are pointed to by DeBruijn indices underneath the ALO). And I think that means we can e.g. call a function in scope twice and observe clobbering. I don't have a minimal example that exploits that. But in my attempts I did find this test case that I think breaks something: |
|
@jamespayor This one was harder! You definitely uncovered some issues with the original approach. I do believe I've resolved this concern though (it is, of course, still not a "perfect MOV," but the goal remains always-correct and sometimes-faster + bounty program success ... at least that's my understanding / goal).
Key results:
Tests:
Additional tests I ran trying to find counter-examples:
Result: #T{0n,1n} (correct).
Result: #T{0n,1n} (correct).
Result: #T{0n,1n} (correct).
Result: #T{0n,1n} (correct). |
|
@HaileyStorm indeed you got some surprising results / solved the specific buggy cases presented. However, as you said, it's not a perfect general MOV. This means that despite optimizing the specific programs, you can't rely on this in general for fusion. The problem with this is that you cannot rely on it for example to implement |
|
That's definitely fair (re SupGen particularly). |
|
Hey! Sorry I definitely disagree. The bounty was clear that the solution had to be correct in general, not just in this case:
I specifically put this 10k bounty to a correct implementation of MOV. If the implementation fails in general (which seems to be the case here?) then it is not correct and it sounds a bit unfair to demand a compensation for it. I'm not closing this bounty because I'm absolutely interested in a solution and I still think it could be possible. Now, just to be clear (I'm a bit out of time right now so I can't read it all), my understanding is that there are common / sensible functions where MOV should work, yet this PR's solution fails, right? Also does it work on the tests on the wip/ directory on the main branch? @Lorenzobattistela |
|
Hey @VictorTaelin , thanks for the input and time. To be clear, I'm definitely not demanding. I want to stay friendly and I love this project. *I actually didn't notice the wip directory 😆. I wanted to reply ASAP and state that no counter-example has been provided or discovered by me (that hasn't been resolved so it's now correct). I haven't run the wip folder tests and will do that as soon as I can and update. |
|
@HaileyStorm the solution to Payor's counter example, the one you replied with: Can you point in the term what happens after your parser fix? As far as I understand your MOV implementation, you either choose to run with My argument that it is not general enough for supgen is that the implementation will contain unsafe things (as the example i sent, the one who works in safe mode but not in unsafe), which would make us loose most of the optimization right? (I could be wrong) |
|
Thanks for the clarificaton @Lorenzobattistela .
|
|
@HaileyStorm the WIP dir is new but it just includes some basic tests that any correct implementation of MOV should pass. If your proposal passes these tests, then it might be correct and I'll take a time to review it. Just let me know. |
|
Well, I really, really tried but... |
MOV vs DUP Bounty — Proof Report
This report explains what was broken, what is fixed now, why the fixes are correct, and why MOV can be faster in some cases yet slower in others. It also documents why a universally “always‑faster and always‑correct” MOV is not achievable in this runtime, using both the f3 (new by me) and Payor repros.
1) Problem recap
The bounty requires that MOV and DUP encodings produce the same output under
-C1, and that MOV finishes under 50k interactions on the provided program. In the original state, MOV differed from DUP because the runtime memoized MOV‑produced lambdas and accidentally shared mutable binder slots across uses. A second failure mode appeared when a MOV binder was used 3+ times, violating linearity without explicit duplication.Two minimal repros make the issues concrete:
2) Root cause
In HVM4’s C runtime, application performs in‑place substitution through binder slots / SUB cells (via
heap_subst_var). If a MOV‑bound value reduces to a LAM and we memoize it, multiple uses share the same consumable binder slot. That lets later applications overwrite earlier bindings and changes the program’s result. This is the root cause of the Payor mismatch. Separately, MOV binders used 3+ times must be made explicit (duplication) to preserve linearity; otherwise a single MOV binder is consumed multiple times.3) Fixes present in the current code
The current code fixes both failure modes without inventing new semantics. These are the changes intended for PR:
3.1 SAFE_MOV: disable MOV‑LAM memoization by default (only)
Files:
hvm4_static/clang/wnf/mov_lam.c,hvm4_static/clang/hvm4.c,hvm4_static/clang/main.cWhat changed: when a MOV value reduces to a LAM, the runtime does not memoize the MOV expr slot (no
heap_subst_varon the MOV slot). Each use gets a fresh lambda wrapper instead of sharing a mutable binder slot.Why: prevents multiple applications from clobbering one shared binder cell.
Scope note: SAFE_MOV only affects MOV‑LAM memoization. MOV‑NOD / MOV‑SUP / MOV‑RED still memoize normally.
Impact: correctness for Payor; small overhead in lambda‑heavy MOV paths because each use rebuilds the lambda wrapper.
Toggle: set
HVM4_SAFE_MOV=0to enable legacy memoization. Default is safe (memoization off for MOV‑LAM).3.2 Auto‑dup for MOV uses > 2 (parser‑level)
File:
hvm4_static/clang/parse/term/mov.cWhat changed: if a MOV binder is used more than twice,
parse_auto_dup(..., BJM, 0)rewrites the body to make extra uses explicit. The MOV value is consumed once and the extra uses are routed through DUPs.Why
BJMis correct here: at parse time, MOV‑bound variables are represented asBJM(seehvm4_static/clang/parse/term/var.c, MOV binder path).parse_auto_dupoperates on the parser’s de‑Bruijn representation (BJV/BJ0/BJ1/BJM), not runtime GOTs.Why: preserves linearity under nested duplication and fixes the f3 repro.
Impact: correctness for 3+ uses; small compile‑time rewrite and extra DUPs at runtime. This is required for semantics and is not optional.
3.3 Minimal MOV/DUP dispatch (no special MOV‑under‑DUP fast paths)
File:
hvm4_static/clang/wnf/_.cWhat changed: WNF dispatch stays aligned with upstream; no custom MOV‑under‑DUP or DUP/MOV fast paths. This avoids adding administrative interactions that were inflating counts without correctness benefit.
3.4 Safe‑atom MOV fast paths
Files:
hvm4_static/clang/wnf/mov_nod.c,hvm4_static/clang/wnf/mov_sup.cWhat changed: when all fields are immutable atoms (e.g.,
NUM,NAM,ERA, quoted vars), skip GOT allocation and directly reconstruct the node. Inmov_nod, this is guarded byari <= 16to keep a small stack buffer.Why: avoid needless GOT indirections for trivial data.
Impact: small perf win on atom‑heavy values; no semantic change.
4) Why MOV can be faster (and when)
MOV can reduce interactions when it avoids explicit duplication and avoids propagating duplication through large structures. Typical wins:
This is why the bounty program drops from ~97k interactions in the DUP encoding to ~24.8k in the MOV encoding.
5) Why MOV can be slower (and when)
MOV adds administrative overhead even when it is correct:
The Payor repro is a concrete case where MOV is correct but slower (22 vs 17 interactions). So “MOV is never slower” is not generally true without additional restrictions.
6) Why an “always‑faster and always‑correct MOV” is not achievable (in this runtime)
This is the conceptual boundary for the current MOV design (memoized sharing of potentially consumable structures like LAM). The key fact is that “used once per branch” is not a reduction invariant. During reduction, branch structure can vanish (e.g., same‑label SUP annihilation), collapsing the computation into a single residual term that simultaneously needs multiple branch‑indexed components. In such cases, sharing a single mutable binder slot becomes incorrect: the second use can overwrite the first. Correctness then forces duplication. Once duplication is forced, MOV cannot be universally faster than explicit DUP in this runtime without stronger primitives.
Both repros illustrate this:
Conclusion: a “perfect” MOV that is always faster and always correct is not possible in this runtime without changing the calculus or adding new, stronger primitives.
7) Evidence (commands and outputs)
All commands run using
hvm4_static/clang/mainbuilt withgcc -O2.Correctness: f3 repro
Output (both):
Correctness: Payor repro
Output (both):
Unsafe mode reproduces the bug
With
HVM4_SAFE_MOV=0, MOV‑LAM memoization is re‑enabled and the Payor repro diverges from the DUP result (unsafe behavior). This demonstrates that memoizing MOV‑produced lambdas is unsound in this runtime.Output:
Performance: Full bounty program (MOV)
Output:
Status: correct output, meets <50k target for the bounty program and the added repro/test set.
Control: Full bounty program (DUP)
Output:
Example where MOV is correct but slower
This runs
payor_repro(MOV) andpayor_repro_dup(DUP) with-s. MOV is correct but shows more interactions (22 vs 17).MOV‑LAM microbench (SAFE_MOV on/off)
Files:
hvm4_static/test/mov_lam_bench.hvm4hvm4_static/test/mov_lam_bench_dup.hvm4mov_lam_bench.shRun:
Observed:
Conclusion: memoizing MOV‑LAM yields only a tiny improvement on this microbench and still trails DUP.
MOV‑LAM erase microbench
Files:
hvm4_static/test/mov_lam_erase_bench.hvm4hvm4_static/test/mov_lam_erase_bench_dup.hvm4Observed:
Conclusion: memoization does not improve this case (already minimal).
8) Files changed
hvm4_static/clang/hvm4.c— SAFE_MOV flag and env override.hvm4_static/clang/main.c— SAFE_MOV initialization.hvm4_static/clang/wnf/mov_lam.c— skip MOV‑LAM memoization when SAFE_MOV is enabled.hvm4_static/clang/parse/term/mov.c— auto‑dup for uses > 2.hvm4_static/clang/wnf/_.c— dispatch aligned with upstream (no MOV‑under‑DUP special paths).hvm4_static/clang/wnf/mov_nod.c,hvm4_static/clang/wnf/mov_sup.c— safe‑atom fast paths.hvm4_static/test/mov_bounty.hvm4,hvm4_static/test/mov_bounty_dup.hvm4,hvm4_static/test/mov_bounty_f3.hvm4,hvm4_static/test/mov_bounty_f3_dup.hvm4,hvm4_static/test/payor_repro.hvm4,hvm4_static/test/payor_repro_dup.hvm4.hvm4_static/test/mov_bounty_min.hvm4,hvm4_static/test/mov_bounty_min_dup.hvm4,hvm4_static/test/mov_bounty_small.hvm4,hvm4_static/test/mov_bounty_small_dup.hvm4,hvm4_static/test/mov_bounty_tiny.hvm4,hvm4_static/test/mov_bounty_tiny_dup.hvm4,hvm4_static/test/mov_dup_func.hvm4,hvm4_static/test/mov_dup_nested.hvm4,hvm4_static/test/mov_dup_test.hvm4,hvm4_static/test/mov_erase_test.hvm4,hvm4_static/test/mov_erase_test_dup.hvm4,hvm4_static/test/mov_lam_bench.hvm4,hvm4_static/test/mov_lam_bench_dup.hvm4,hvm4_static/test/mov_lam_dupvar.hvm4,hvm4_static/test/mov_lam_erase_bench.hvm4,hvm4_static/test/mov_lam_erase_bench_dup.hvm4,hvm4_static/test/mov_lam_test.hvm4.mov_vs_dup_slow.sh,mov_lam_bench.sh.9) Test suite run
Outcome: PASS. The script reports
clang: command not foundbut continues using the existingclang/mainbinary.10) Letter vs spirit of the bounty
-C1for the bounty program and the added repro/test set, and MOV completes the bounty program in 24,805 interactions (<50k).11) Remaining risks / follow‑ups